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Purpose: Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals 
of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse 
transcription polymerase chain reaction (qRT-PCR) methods and to evaluate protocols for optimal high-throughput data 
analysis. 

Methods: Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl~ / ~~) mice 
were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters 
were analyzed at the transcript isoform level with two methods: Burrows- Wheeler Aligner (BWA) followed by ANOVA 
(ANOVA) and TopHat followed by Cufflinks. qRT-PCR validation was performed using TaqMan and SYBR Green 
assays. 

Results: Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse 
genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl~ h mice with BWA workflow and 
34,1 15 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, 
and 12 of these were validated with qRT-PCR. RNA-seq data had a linear relationship with qRT-PCR for more than four 
orders of magnitude and a goodness of fit (R 2 ) of 0.8798. Approximately 10% of the transcripts showed differential 
expression between the WT and Nrl~'~ retina, with a fold change > 1.5 and p value <0.05. Altered expression of 25 genes 
was confirmed with qRT-PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical 
clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal 
function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary 
insights in transcriptome profiling. 

Conclusions: Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated 
by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for 
comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate 
quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based 
transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic 
functions. 



Next-generation sequencing (NGS) technology has 
launched a new era of enormous potential and applications in 
genomic and transcriptomic analyses [1-3]. With continued 
cost reductions and improved analytical methods, NGS has 
begun to have a direct impact on biomedical discovery and 
clinical outcome [4-6]. NGS has enabled "meta-genomic" 
studies to survey the genomes of organisms in a particular 
ecosystem [7], and decode the entire genomes of species 
ranging from bacteria [8,9] and viruses [10] to humans [11]. 
Whole -genome sequencing has made it possible to investigate 
genetic variations [12], global DNA methylation [13], and in 
vivo analysis of targets of DNA -binding proteins [14,15]. 
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Deep sequencing of RNA with NGS (called "RNA-seq") 
allows a comprehensive evaluation and quantification of all 
subtypes of RNA molecules expressed in a cell or tissue 
[16]. RNA-seq technology can detect transcripts expressed at 
low levels [17] and permit the identification of unannotated 
transcripts and new spliced iso forms [16,18]. The issues 
related to cross-hybridization and detection levels that limit 
the accuracy of gene expression estimates by microarray 
technology are not relevant to the data obtained with RNA- 
seq [19]. Visualization of mapped sequence reads spanning 
the splice junctions can also reveal novel splice forms of 
annotated genes in the mouse retina, which was not possible 
with earlier hybridization-based technologies. With a steady 
reduction in the costs of NGS, RNA-seq is now emerging as 
a method of choice for comprehensive transcriptome 
profiling. 
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The vertebrate retina exhibits a highly organized laminar 
structure that captures, integrates, and transmits visual 
information to the brain for further processing. Photoreceptors 
constitute more than 70% of the retinal cells and convert light 
into electrical signals [20]. Rod photoreceptors mediate dim 
light vision and can detect a single photon of light, while cone 
photoreceptors are responsible for daylight vision, color 
perception, and visual acuity [21,22]. Impairment of 
photoreceptor function leads to retinal degeneration with a 
more common pattern of rod death preceding the death of 
cones [23-25]. The neural retina leucine zipper (AW) gene 
encodes a basic-motif leucine zipper transcription factor 
necessary for determining rod photoreceptor cell fate and 
functional maintenance [26]. The AW 7 ~ mouse, generated by 
creating a loss of function mutation in AW, has a cone-only 
retina that serves as a useful model for studies of cone biology 
[26-28]. 

Several previous investigations have elucidated the gene 
expression landscape specific to whole retina or retinal cell 
types and during development or aging. Serial analysis of gene 
expression [29-31] and cDNA eye gene arrays [32-36] were 
initially used to determine signatures of retinal gene 
expression. Oligonucleotide microarrays have since allowed 
a more comprehensive approach to expression profiling 
[37-41]. Microarray analyses of flow-sorted photoreceptors 
and single cells from dissociated retinas [42-44] have begun 
to reveal new insights into regulatory networks. Application 
of NGS technology greatly expands the power of expression 
profiling by identifying all transcripts and spliced isoforms in 
the tissue or cell type of interest. 

Here, we have used the power of NGS-based RNA-seq 
analysis to investigate in depth the transcriptome of wild-type 
(WT) and AW~ /_ retinas and identified a set of differentially 
expressed genes and spliced isoforms. We have also taken 
advantage of the knowledge about AW~ /_ mice to optimize 
workflows for data analysis and compared our results with 
those obtained with microarray methods and quantitative 
reverse transcription polymerase chain reaction (qRT-PCR) 
analysis. Our studies illustrate that RNA-seq offers a more 
complete, accurate, and relatively faster approach for 
comparative and comprehensive analysis of retinal 
transcriptomes and for discovering novel transcribed 
sequences. Our validated data analysis workflow should also 
be beneficial for similar studies by other investigators. Raw 
data and workflow are available on the N-NRL/NEI website. 

METHODS 

Animals and tissue collection: All investigations on mice were 
approved by the Animal Care and Use Committee of the 
National Eye Institute and followed the tenets of the 
Declaration of Helsinki. C57B1/6J (referred to as wild type, 
WT) and Nrl (on C57B1/6J background [26]) mice were 
euthanized with CO2 inhalation. The retinas were excised 
rapidly, frozen on dry ice, and stored at -80 °C. 
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RNA isolation: Fresh frozen mouse retinas were lysed with a 
mortar and pestle in TRIzol Reagent, and total RNA was 
isolated according to the manufacturer's protocol (Invitrogen, 
Carlsbad, CA). RNA quality and quantity were assessed with 
the RNA 6000 Nano Kit (Agilent, Santa Clara, CA). 

NGS library construction: Whole retinal RNA samples were 
independently processed from three wild-type and three Nrl 
~ h mice at P21. Total RNA (1 u.g) was used with the TruSeq 
mRNA-seq Sample Preparation Kit (Illumina, San Diego, 
CA) to construct cDNA libraries. The quality of the libraries 
was verified using the DNA-1000 Kit (Agilent) and 
quantitation performed with qRT-PCR using ABI 7900HT 
(Life Technologies, Carlsbad, CA), as suggested in the 
Sequencing Library qRT-PCR Quantification Guide 
(Illumina). Gene Expression Master Mix (Life Technologies) 
was used for the qRT-PCR reactions, and a titration of phiX 
control libraries was employed as the quantification standard. 

Illumina sequencing: Each cDNA library (10 pM) was 
independently loaded into one flow cell lane, and single-read 
cluster generation proceeded using the TruSeq SR Cluster 
Generation Kit v5 (Illumina). Sequencing-by-synthesis (SBS) 
of 70-nucleotide length was performed on a Genome Analyzer 
IIx running SCS2.8 software using SBS v4 reagents 
(Illumina). Base calling and chastity filtering were performed 
using RTA (real-time analysis with SCS2.8). 

Burrows— Wheeler transform-based short read aligner 
analysis workflow: Burrows-Wheeler Transform Aligner 
(BWA) [45] was used to align RNA-seq reads against the 
mouse reference genome (build mm9), downloaded and 
indexed from the University of California Santa Cruz 
(UCSC) genome browser database [46]. The resulting 
sequence alignment/map files were imported into Partek 
Genomics Suite (Partek Inc., St. Louis, MO) to compute raw 
and fragments per kilobase of exon model per million mapped 
(FPKM) reads normalized expression values of the transcript 
isoforms defined in the UCSC refFlat file. A stringent filtering 
criterion of FPKM value 1.0 (equivalent to one transcript per 
cell [16]) in at least one out of six samples was used to obtain 
expressed transcripts. The FPKM values of the filtered 
transcripts were log-transformed using log2 (FPKM+offset) 
with an offset=1.0. ANOVA (ANOVA) was then performed 
on the log-transformed data of the two groups (WT and Nrl~' 
") to generate fold change and p values for each transcript. Y- 
chromosome transcripts were filtered out along with non- 
coding (nc) RNAs, mitochondrial DNA coded genes, 
pseudogenes, and predicted protein-coding genes. 
Differentially expressed mRNA isoforms were filtered for a 
fold change cutoff of 1.5 and p-value cutoff of 0.05. These 
criteria were implemented to enable a comparison with 
previous expression studies. Hierarchical clustering was 
performed using Cluster 3.0 software [47]. We used 
uncentered correlation as the distance metric. Heatmaps and 
dendrograms were generated using JavaTreeView software 
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Table 1. TaqMan assays employed for qRT-PCR validation 



TaqMan assay ID 


Gene symbol 


Gene name 


Mm00607939 si 


Actb 


actin, b 


Mm00504628_ml 


Arr3 


arrestin 3, retinal 


Mm00437764_ml 


B2m 


b-2 microglobulin 


Mm00474799_ml 


Cadm3 


cell adhesion molecule 3 


Mm00432322_ml 


Casp7 


caspase 7 


Mm00833234_ml 


Cngal 


cyclic nucleotide gated channel a 1 


Mm00489232_ml 


Cngb3 


cyclic nucleotide gated channel b 3 


Mm00656724_ml 


Egrl 


early growth response 1 


Mm00442411 ml 


Esrrb 


estrogen related receptor, b 


Mm00438796_ml 


Eyal 


eyes absent 1 homolog (Drosophila) 


Mm00445225_ml 


Fabp7 


fatty acid binding protein 7, brain 


Mm99999915_gl 


Gapdh 


glyceraldehyde-3-phosphate dehydrogenase 


Mm00492388_gl 


Gnatl 


guanine nucleotide binding protein, a transducing 1 


Mm00492394_ml 


Gnat2 


guanine nucleotide binding protein, a transducing 2 


Mm01197698_ml 


Gusb 


glucuronidase, b 


Mm01318747_gl 


Hprtl 


hypoxanthine guanine phosphoribosyl transferase 1 


Mm00833431 gl 


Hsp90abl 


heat shock protein 90 kDa a, class B member 1 


Mm01340839_ml 


Mef2c 


myocyte enhancer factor 2C 


Mm00443299_ml 


Nr2e3 


nuclear receptor subfamily 2, group E, member 3 


Mm00476550_ml 


Nrl 


neural retina leucine zipper gene 


Mm00524018_ml 


Nxnll 


nucleoredoxin-like 1 


Mm00433560_ml 


Opnlmw 


opsin 1 (cone pigments), medium-wave-sensitive 


Mm00432058_ml 


Opnlsw 


opsin 1 (cone pigments), short-wave-sensitive 


Mm00476679_ml 


Pde6b 


phosphodiesterase 6B, cGMP, rod receptor, b 


Mm00473920_ml 


Pde6c 


phosphodiesterase 6C, cGMP specific, cone, a prime 


Mm01225301_ml 


Pgkl 


phosphoglycerate kinase 1 


Mm00519814_ml 


Reep6 


receptor accessory protein 6 


Mm00520345_ml 


Rho 


Rhodopsin 


Mm00524993_ml 


Rorb 


RAR-related orphan receptor b 


Mm01612986 gH 


Rpll3a 


ribosomal protein L13A 


Mm02601831_gl 


Rps26 


ribosomal protein S26 


Mm00774693_gl 


SalB 


sal-like 3 (Drosophila) 


Mm01249143_gl 


Socs3 


suppressor of cytokine signaling 3 


Mm01277045_ml 


Tbp 


TATA box binding protein 


Mm00726185 si 


Tubb4 


tubulin, b 4 


Mm01198158_ml 


Ubc 


ubiquitin C 


Mm00457574_ml 


Wispl 


WNT1 inducible signaling pathway protein 1 



[48]. Aligned reads were visualized using the Integrated 
Genomics Viewer (IGV) [49]. 

TopHat/Cufflinks-based analysis workflow: Raw reads that 
passed the chastity filter threshold were mapped using 
TopHat [50] to identify known and novel splice junctions and 
to generate read alignments for each sample. Genomic 
annotations were obtained from Ensembl in gene transfer 
format (GTF). Splice junctions from the six samples were 
combined into a master junctions file that was used as an input 
file for the second iteration of TopHat mapping. The transcript 
isoform level and gene level counts were calculated and 
FPKM normalized using Cufflinks. An FPKM filtering cutoff 
of 1 .0 in at least one of the six samples was used to determine 
expressed transcripts. Differential transcript expression was 
then computed using Cuffdiff. The resulting lists of 
differentially expressed isoforms were filtered and sorted into 
the following categories: protein coding mRNA transcripts 
and ncRNA transcripts. 

qRT-PCR analysis: Reverse transcription (RT) reactions 
were performed using oligo(dT)20 with Superscript II 



reagents (Life Technologies) according to the manufacturer's 
protocol. cDNA synthesized from 2 |ig of total RNA (1 |ig for 
minus RT controls) was diluted to 100 ul (fivefold dilution), 
and from this 1 (il was used for each qRT-PCR reaction. The 
qRT-PCR reactions were performed in triplicate for TaqMan 
assays or in duplicate for the SYBR assays, using three 
biologic replicates per genotype, on a 7900HT Genetic 
Analyzer (Life Technologies). TaqMan assays were 
performed using TaqMan Gene Expression Master Mix and 
TaqMan Gene Expression Assays (Life Technologies) for 
genes listed in Table 1. The SYBR Green assays (Table 2) 
were performed using Power SYBR Green Master Mix (Life 
Technologies) and oligonucleotides at a final concentration of 
200 nM. Oligonucleotides were designed using the Primer3 
PCR Primer Design Tool [51] and synthesized by Integrated 
DNA Technologies (Coralville, IA). To eliminate 
complications due to contaminating genomic DNA in the 
RNA samples, qRT-PCR reactions were also performed with 
minus-RT control, using hypoxanthine guanine 
phosphoribosyl transferase (Hprt) primer pairs that can 
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Table 2. SYBR Green assays employed for qRT-PCR validation 



Gene symbol 

Abcal3 (Exon 
53/55) 

Abcal3 (Exon 

58/60) 

Acoxl 

Akt3 

Cadm3 

Ccdc24 

Cd8a 

Cox5b 

Ctss 

Drd4 

Dynlt3 

Hr 

K119 

K1M3 

KM33 

Neurodl 

Nipall 

Pip5kla 

Plekhf2 

Rabl8 
Rgs22 
Rpgripl 
Sema7a 

Txnip 
Wispl 
Wscd2 



Gene name 

ATP-binding cassette, sub-family A (ABC1), member 13 



Forward 

GACCTTCTGAGATGGCCAAG 



ATP-binding cassette, sub-family A ( ABC 1 ), member 1 3 CGGTACCTCTGGCAAACAAT 



acyl-CoA oxidase-like 
thymoma viral proto-oncogene 3 
cell adhesion molecule 3 
coiled-coil domain containing 24 
CD8 antigen, alpha chain 
cytochrome c oxidase, subunit Vb 
cathepsin S 
dopamine receptor D4 
dynein light chain Tctex-type 3 
hairless 

Kruppel-like factor 9 
kelch-like 3 
kelch-like 33 

neurogenic differentiation 1 
NIPA-like domain containing 1 

phosphatidylinositol-4-phosphate 5-kinase, type 1 alpha 

pleckstrin homology domain containing, family F (with 

FYVE domain) member 2 

RAB 1 8, member RAS oncogene family 

regulator of G-protein signaling 22 

retinitis pigmentosa GTPase regulator interacting protein 1 

sema domain, immunoglobulin domain (Ig), and GPI 

membrane anchor, (semaphorin) 7A 

thioredoxin interacting protein 

WNT1 inducible signaling pathway protein 1 

WSC domain containing 2 



TGCTGTATGGAACGAAGCTG 

CATCTGAAACAGACACCCGATA 

AGGGATTGTGGCTTTCATTG 

TGTCACATGTTGCAGAACGA 

GACATCTCAGCCCCAGAGAC 

CGTCCATCAGCAACAAGAGA 

TAAAGGGCCTGTCTCTGTGG 

AGACTGCCCACCTCCCTTAC 

TTGATGGAGTTTTGGGTGGT 

GCCCTCTCTGCTCAGCTCTA 

ACAGTGGCTGTGGGAAAGTC 

GAGCACTGGGAGGAGCTATG 

AGCTTCTTCCCTTTGGTGGT 

GCGTTGCCTTAGCACTTCTT 

CCCACAAGAGGGAGAAGTCA 

GGGGAACACAGAGCACAAGT 

GTTGTCGGGTTCGACTGGA 

TGCACGCAAGCATTCTATGT 
GCCCAGAAGATCCTTGAACA 
GCCATGCTACATGCTCAAGA 
TCTACAGCTCCCAACGATCA 

TATGTACGCCCCTGAGTTCC 
GCTCTACCACCTGTGGCCTA 
TCTGCATCAAGACCCATGAA 



Reverse 

TTAACTCCAAGGAGCCCAAA 

GGAAATGGAGCTTCAAGCAG 

TGTGGAATGTTGAAGGCAGA 

GTCCGCTTGCAGAGTAGGAG 

CTAGGGGCTCAGGAGTTGTG 

TCTAAGGCTGGGAATGGATG 

GCTTGCCTTCCTGTCTGACT 

ATAACACAGGGGCTCAGTGG 

GCCATCCGAATGTATCCTTG 

AAGAAAGGCGTCCAACACAC 

GGTACGGTTCTCCCATCTGA 

CGGACCACACCGTCTAAGTT 

CATGCTTGGTGAGATGGTCA 

AGGAGGTTGGTCTGCTGAGA 

CTACAGCCACCGCTGACATA 

AGGAGTGTGTGTTGGCATTT 

GTAAACAGGCTTCCGTTCCA 

GGTCTTCTGAGGCTCACTGC 

TGCGTCTAGTATTCGCCTCAC 

GGCTCTCTTCCCTGTGTGAC 
CGCCTTGTCCTCTTCTGTGT 
TTTGGATGGCCTGGTTTCTA 
GCTCACAGCTCTGTTCCACA 

GTTCCCCGCTGTAGAGACTG 
ACAGCCTGCGAGAGTGAAGT 
ACGGTCTTGCCAAACTTGAG 



Table 3. Summary of Illumina base calling and alignments 



Genotype WT WT WT Nrl'- Nrl'- Nrl'" 

Sample 1 Sample 2 Sample 3 Sample 1 Sample 2 Sample 3 

Total reads 35,872,080 41,785,800 49,076,400 46,689,240 48,480,240 48,656,040 

PF Reads 29,603,280 33,251,160 37,642,800 36,472,800 37,119,960 36,823,320 

82.7% 79.7% 76.9% 78.2% 76.7% 75.8% 

BWA alignments 24,992,271 27,922,997 32,085,799 30,960,565 31,374,578 31,257,335 

TopHat alignments 30,769,939 34,177,120 39,222,596 38,289,469 38,744,790 38,593,533 

Each of the 3 week old WT and Nrl retina sample was sequenced on a separate lane of the Illumina GAIIx flow cell to obtain 
35 to 49 million raw reads. Over 75% of the raw reads passed Illumina's read chastity threshold to yield 29 to 37 million usable 
PF reads. TopHat mapping always gave significantly more alignments than BWA because of its ability to map across the splice 
junctions. A relatively smaller numbers of reads and alignments for WT samples 1 and 2 are not a matter of concern as FPKM 
normalization was used to assess the transcript isoform expression. WT=wild type. PF=pass filter 



differentiate between mRNA and genomic DNA (data not 
shown). Differential expression analysis was performed using 
the ddCt method [52], with the geometric average of actin, 
beta (ActB) and Hprt as the endogenous controls [53]. 

RESULTS 

Sequencing run summary: Six libraries of P21 retinal cDNA 
(three each from WT and Nrl~'~) were sequenced to obtain 35 
to 49 million raw sequence reads per sample (Table 3). Of 
these, 75.8% to 82.7% reads passed the RTA chastity filter 
and were used for subsequent Burrows-Wheeler Aligner 
(BWA) and TopHat analysis workflows (Figure 1). Due to 
TopHat workflow's power to map across splice junctions, the 



workflow consistently yielded 6 to 7 million more alignments 
per sample when compared to BWA. 

BWA workflow: Based on the BWA analysis workflow, 
16,014 transcripts were detected with a normalized FPKM 
value greater than 1 .0 in any of the six samples. Transcripts 
were filtered based on whether they were mRNAs or ncRNAs. 
Of the 15,142 mRNA transcripts, only 1,422 met our criteria 
of differential expression of having a fold change greater than 
1.5 and a p-value less than 0.05 (Table 4). Of the 1,422 
differentially expressed mRNA transcripts (DETs) 
representing 1,218 unique genes, 551 were downregulated in 
Nrl (including rod-specific genes) retinas, and 871 were 
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Figure 1 . Flowchart of RNA-seq data analysis methodology using Burrows-Wheeler Aligner (BWA) and TopHat. Schematic representation 
of two RNA-seq data analysis workflows and resulting views of the data generated. A: BWA workflow: Gapped alignments are performed 
using the BWA algorithm against the mouse reference genome build mm9, and estimation of the expression of genes at the transcript isoform 
level is performed by importing aligned reads into the Partek Genomics Suite using annotations provided by the University of California Santa 
Cruz (UCSC) refflat.txt file. Transcripts expressed at low levels in all samples (<1 fragments per kilobase of exon model per million mapped 
reads [FPKM]) are filtered out. Differential expression analysis was performed by applying the ANOVA (ANOVA) method, and the resulting 
list was sorted and filtered into different transcript groups. Clustering of rod and cone enriched genes was perforated using Cluster 3.0 software 
(see Methods). B: TopHat workflow: Splice junction mapping was performed using the TopHat algorithm in two phases. In the first phase, 
splice junctions were detected de novo from the reads from each sample and combined to obtain a master splice junctions list. In the second 
phase of TopHat alignment, reads from each sample were re-aligned by providing the master junctions list as input. The two-phase mapping 
approach significantly increased the number of alignments spanning the splice junctions. Estimation of gene expression and differential 
expression were computed using Cufflinks, Cuffcompare, and Cuffdiff. Sorting and filtering of transcript isoforms were performed as in the 
BWA workflow. 3038 
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Table 4. Summary of transcript isoforms detected by BWA/ANOVA and TopH at/Cufflinks workflows 

Analysis BWA/ANOVA TopHat/Cufflinks 

Total detected transcripts 16,014 34,115 

mRNA 15,142 32,001 

mRNADETs 1,422 3,258 

The BWA workflow employed refflat.txt annotation for mouse build mm9 from UCSC genome browser. The TopHat workflow 
employed GTF annotation for mouse build mm9 from the Ensembl database. After FPKM filtering (see Materials and Methods), 
transcribed features were classified as protein coding mRNAs and non-coding (nc) RNAs. The features classified as protein- 
coding mRNAs were further filtered based on fold change (>1 .5) and p- value (<0.05) to be considered significantly differentially 
expressed transcripts (DETs). 



upregulated in Nrl ' (including cone-enriched genes and 
those involved in retinal remodeling) retinas. 

TopHat workflow: A total of 34,1 15 transcripts were detected 
with a normalized FPKM value of greater than 1 .0 in any of 
the samples in either group. Transcripts were filtered based 
on whether they were protein-coding mRNAs or ncRNAs. Of 
the 32,001 mRNA transcripts, only 3,258 met our criteria of 
differential expression (Table 4). The DETs represented 1990 
unique genes; 1,504 were downregulated in AW~ 7 ~ (including 
rod-specific genes) retinas, and 1,754 were upregulated in the 
NrF'~ (including cone-enriched genes and those involved in 
retinal remodeling) retinas. 

Comparison of the results from BWA and TopHat analyses: 
The BWA/ANOVA and TopHat/Cufflinks analyses were 
compared to assess the consistency and quality of the results. 
Using the official Mouse Genome Informatics gene symbol 
as the linking term, Venn diagrams were constructed to 
summarize the overlap between the set of all (Figure 2A), the 
top 500 (Figure 2B), and the top 200 (Figure 2C) DETs from 
the BWA workflow and the DET list from the TopHat 
workflow. A comparison of the full list of BWA DETs to the 
TopHat list revealed only 51.7% overlap between the 
differentially expressed genes (DEGs) from BWA and 
TopHat. This overlap increased to 73.8% and 87.8% when 
only the top 500 and 200 DEGs from BWA, respectively, were 
considered. Subsequent analyses were performed using BWA 
data. 

Regression analysis of quantitative expression values 
obtained with RNA-seq and TaqMan qRT-PCR assays: We 
first assessed the correlation between the FPKM values 
(obtained with RNA-seq) with their corresponding qRT-PCR 
crossing threshold (Ct) values from the TaqMan assays; the 
two values represent the quantitative levels of expression of a 
specific transcript in the RNA sample. For this purpose, we 
chose 24 differentially expressed genes (DEGs, reflecting a 
wide range of expression) and 12 housekeeping genes 
(HKGs). The Ct values from three biologic replicates (without 
normalization) were then compared to the corresponding log2 
FPKM values (Figure 3). A least-squares regression analysis 
of DEGs provided an R 2 value of 0.8798, with a corresponding 



slope of -1.056, suggesting a strong inverse relationship 
between Ct and log2 FPKM values over a dynamic range of 
4-5 orders of magnitude. Only one out of 24 genes, cell 
adhesion molecule 3 {CadmS), fell outside this correlation. 
Further investigation of the RNA-seq aligned reads showed 
that our qRT-PCR assay was specific for only one of the two 
retina-expressed spliced isoforms of Cadm3. The reanalysis 
using a S YBR assay designed to detect both Cadm3 transcripts 
confirmed the linear correlation between RNA-seq and qRT- 
PCR analysis. Interestingly, FPKM and Ct values for 6 of the 
12 HKGs did not show the expected linear relationship; these 
included ubiquitin C (Ubc), ActB, ribosomal protein L13A 
(Rpll3a), ribosomal protein S26 (Rps26), phosphoglycerate 
kinase 1 (Pgkl), and most severely glyceraldehyde-3- 
phosphate dehydrogenase (Gapdh). With the exception of 
Ubc that was underestimated by qRT-PCR (in the same 
manner as Cadm3), the BWA workflow underestimated all 
others. 

A comparison of RNA-seq and qRT-PCR data for 
housekeeping genes: RNA-seq data were evaluated for the 
expression of 27 established HKGs (Table 5) included in the 
control qRT-PCR plates from the following vendors: Life 
Technologies (Mouse Endogenous Control Array), SA 
Biosciences, Frederick, MD (Mouse Housekeeping Genes 
RT 2 Profiler PCR Array), and Qiagen, Valencia, CA 
(QuantiTect Housekeeping Genes). Comparison of qRT-PCR 
data for 12 genes (that were tested) showed almost complete 
concordance of expression with the RNA-seq results. Only 
one gene, Ubc, revealed a significant difference in expression 
between the WT and Nrf~ retinas with qRT-PCR (-1.89 
fold) compared to the RNA-seq (—1.19 fold) analyses. 
Gapdh showed a relatively high change in expression in qRT- 
PCR and RNA-seq (-1.49 and -1.37 fold, respectively). 
Hprt and Rpll3a revealed lower variation in qRT-PCR and 
RNA-seq, respectively. Actb, TATA box binding protein 
(Tbp), glucuronidase, beta (Gusb), and Pgkl were among the 
best HKGs for qRT-PCR and RNA-seq normalization. For 
further qRT-PCR analyses, we employed ActB and Hprt in 
all normalization calculations. 
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Figure 2. Venn diagrams comparing 
differentially expressed transcripts 
(DETs) between the Nrt<- and WT 
groups from BWA and TopHat 
analyses. Despite major differences in 
the UCSC refFlat annotations used by 
Burrows- Wheeler Aligner (BWA) and 
Ensembl annotations used by TopHat, 
most of the genes identified by BWA 
were also identified as significant by 
TopHat. A: Comparison of the total 
number of DETs identified as 
significant (fold change >1.5 and p- 
value <0.05) by the two methods. B: 
Inclusion of the top 500 DETs (424 
unique genes) identified as significant 
by BWA and in the full TopHat DET 
list. C: Inclusion of the top 200 DETs 
(179 unique genes) identified as 
significant by BWA and in the full 
TopHat DET list. We assess the two 
methods based on a comparison of qRT- 
PCR data for the genes detected by 
either or both methods. The discrepancy 
between the results can be attributed to 
differences in the input annotation files 
used (UCSC refFlat versus Ensembl 
GTF) by the two methods and their 
alignment algorithms. 



A comparison of RNA-seq and qRT-PCR analysis for DEGs: 
Based on the RNA-seq data from the WT and Nrl~'~ retinas, 
we selected 25 DEGs (12 downregulated and 13 upregulated) 
showing a wide range of differential expression for validation 
with qRT-PCR analysis. qRT-PCR data for all genes 
validated the RNA-seq results (Figure 4). The WNT1 
inducible signaling pathway protein 1 (Wispl) TaqMan assay 
did not produce an amplicon in any of the experiments 
performed; subsequent examination of the RNA-seq data 
revealed that this assay did not correspond to the splice variant 
expressed in the retina. Additional analysis using a SYBR 
assay with oligonucleotides specific to the retinal splice 
variant confirmed the differential expression of Wispl (-43.9 
fold change) in the Nrl~'~ retina compared to the WT. 
Expression levels of transcripts in the WT and Nrl~'~ retina: 
The preceding analysis clearly demonstrates the high 
reliability and accuracy of the data obtained with RNA-seq 
methodology. We therefore used RNA-seq data to derive 
absolute expression levels of individual transcripts. The top 
25 genes highly expressed in the WT orNrt^ retina are listed 
in Table 6 and Table 7. As predicted, most of these genes 
encode proteins involved in photoreceptor function/ 
metabolism. 

Rod and cone photoreceptor enriched genes: We then focused 
on DEGs between the Nrl~'~~ and WT retinas. A total of 1,422 
transcripts, corresponding to 1,218 unique genes, showed a 



minimum fold change of 1 . 5 at p<0.05 . Hierarchical clustering 
of all differentially expressed transcripts resulted in two 
distinct clusters: one cluster of 477 genes downregulated in 
the Nrl~'~ retina includes all known rod-specific genes such as 
rhodopsin (Rho; FC=-4,804), guanine nucleotide binding 
protein, alpha transducing 1 (Gnatl; FC=-2,034), and nuclear 
receptor subfamily 2, group E, member 3 (Nr2e3; FC=-227.5; 
Figure 5 A and Table 8); and the other cluster of 741 
upregulated genes had all cone-specific genes such as opsin 
1, short-wave-sensitive (Opnlsw; FC=18.4), cyclic 
nucleotide gated channel beta 3 (Cngb3; FC=18.1), and 
Gnat2 (FC=12.2; Figure 5B and Table 9). 

We then compared our DEG list with two published 
studies that examined WT and Nrl~ f ~ retinas: a recent 
transcript-level RNA-seq analysis that included 6,123 DETs 
[54] and a gene-level microarray analysis showing 438 DEGs 
[38] (Figure 6). To obtain the list of DEGs from the Mustafi 
et al. [55] data set, we performed ANOVA on their FPKM 
data from GEO database. Interestingly, the DEGs lists from 
the three studies had only 203 common genes including many 
previously identified genes specifically expressed in cone 
(fatty acid binding protein 7, brain [Fabp7], Opnlsw, 
Cngb3, and Gnat!) or rod (Rho, Gnatl, cyclic nucleotide 
gated channel alpha 1 [Cngal], and Nr2e3) photoreceptors. 
To assess the power of RNA-seq to more comprehensively 
identify DETs than microarray, we examined the list of 634 
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genes identified in common by the RNA-seq studies but not 
by the microarray study. This list included 1 8 retinal disease 
genes (ATP-binding cassette, sub-family A (ABC1), member 
4 [Abca4], cadherin 23 (otocadherin) [Cdh23], ADP- 
ribosylation factor-like 6 \Arl6\, Bardet-Biedl syndrome 9 
(human) [Bbs9], calcium binding protein 4 [Cabp4], cyclic 
nucleotide gated channel alpha 3 [Cnga3], G protein-coupled 
receptor 98 [Gpr98], guanylate cyclase activator la (retina) 
[Gucala], opsin 1 (cone pigments), medium-wave-sensitive 
(color blindness, deutan) [Opnlmw], orthodenticle homolog 
2 (Drosophila) [Otx2], phosphodiesterase 6G, cGMP- 
specific, rod, gamma [Pde6g], peripherin 2 [Prph2], retinol 
binding protein 4, plasma [Rbp4], retinol dehydrogenase 1 (all 
trans) [Rdhl], regulator of G-protein signaling 9 binding 
protein [Rgs9bp], unc-119 homolog (C. elegans) [Unci 19], 
Usher syndrome 2A (autosomal recessive, mild) homolog 
(human) [Ush2a], and whirlin [Whrn]) and several known 
genes involved in visual perception (guanylate cyclase 2e 
[Gucy2e], guanylate cyclase 2f [Gucy2f\, recoverin [Rcvrn], 
RAR-related orphan receptor beta [Rorb], and sal-like 3 
(Drosophila) [Sall3]). Several genes showing large 
differential expression values might participate in rod 
homeostasis (galactosidase, beta 1-like 2 [Glbll2] FC=- 14.02, 
GRAM domain containing 2 [Gramd2] FC=-14.0, 
carbohydrate (chondroitin 6/keratan) sulfotransferase 3 
[Chst3] FC=-4.8, desert hedgehog [Dhh] FC=-4.1, and ADP- 
ribosylation factor-like 4D [Arl4d] FC=-3.6) and cone 
function (dual specificity phosphatase 23 [Dusp23] FC=6.3, 
cyclin-dependent kinase 11B [Cdkll] FC=6.1, tryptophan 
hydroxylase 1 [Tphl] FC=4.7, muscle glycogen 
phosphorylase [Pygm] FC=4.6, cyclin-dependent kinase 6 



Figure 3. Correlation of RNA-seq and 
qRT-PCR. The correlations between 
the RNA-seq fragments per kilobase of 
exon model per million mapped reads 
(FPKM) values and the corresponding 
qRT-PCR crossing threshold (Ct) 
values are shown. FPKM values 
represented in log2 scale, and non- 
normalized Ct values are an average of 
three biologic replicates. Data generated 
from differentially expressed genes 
(black) is contrasted with data generated 
from the housekeeping genes (red). The 
dashed line, associated equation, and 
goodness of fit value were generated by 
least-squares regression analysis of the 
differentially expressed data set. Since a 
lower Ct value indicates an increased 
_i initial amount of target mRNA, an 
1 5 inverse relationship between FPKM and 
Ct values is expected if a correlation 
exists. 



[Cdk6] FC=4.0, Sall3 FC=3.9, and early growth response 1 
[Egrl] FC=3.7). 

Our RNA-seq data allowed us to identify 359 genes not 
identified in previous investigations. To further assess the 
quality of our analysis, we performed qRT-PCR validation of 
15 genes identified by other studies (but not in our study) as 
differentially expressed and of 7 genes uniquely identified by 
our study (but not by other studies; Table 10). Of the 15 genes 
identified by other studies, only three (ATP-binding cassette, 
sub-family A (ABC1), member 13 [Abcal3], CD8 antigen, 
alpha chain [Cd8a], and acyl-CoA oxidase-like [Acoxl]) were 
confirmed with qRT-PCR as being differentially expressed. 
We also detected these three as differentially expressed but 
had filtered them out because of FPKM values that were less 
than 1.0 in all samples. Interestingly, the Abcal3 transcript 
detected in the retina had only sequence reads for exons 56 
through 62. This finding was supported by qRT-PCR using 
two SYBR assays designed to exons 53/55 and exons 56/58. 
All seven genes uniquely identified by our study were 
validated as significantly differentially expressed. 

The significantly lower number of DETs detected by our 
study compared to the Mustafi et al. study (20 1 1 ; 1 ,422 versus 
6,123, respectively) can be attributed to the following: 

1. We used a stringent 1.0 FPKM cutoff that generated a 
list of genes with significant base level expression and fewer 
false positives than a lower expression level threshold. If we 
had decreased our threshold to 0.1 FPKM, we would have 
detected 975 more DETs; however, these genes are expressed 
at an extremely low level and their impact must be weighed 
against the increase in false positives. We chose a 
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conservative criterion to identify significant and bona fide 
differentially expressed genes. 

2. Mustafi et al. [54] pooled multiple RNA samples 
before generating the library and used the identical library on 
multiple lanes of the sequencer. Our experimental design 
consisted of libraries generated from individual biologic 
replicates that allowed us to eliminate the transcripts based on 
p-value. 

Several DETs we identified might contribute to 
photoreceptor function but are not yet characterized; these 
include pleckstrin homology domain containing, family F 
(with FYVE domain) member 2 (Plekhjl; FO-5.35), kelch- 
like 13 (Drosophila) [KIM3] (FO-3.3), NIPA-like domain 
containing 1 (Nipall; FC=-2.8), and coiled-coil domain 
containing 24 (Ccdc24; FC=-2.6) in the WT retina, and kelch- 
like 33 (Drosophila) [KM33] (F014), WSC domain 
containing 2 (Wscd2; FC=4), hairless (Hr; F03.9) and 
regulator of G-protein signaling 22 (Rgs22; F03.8) in the 
NrF'~ retina. We also identified Crx opposite strand transcript 
1 (Crxosl; FC=4. 1), which is exclusively expressed in the eye 
from the opposite strand of a key retinal transcription factor, 
cone-rod homeobox containing gene (Crx) [55]. An 
interesting new finding is the retinal expression of multiple 
genes from the Kelch-like family (Klhl3, 4, 5, 18, 33, 36), 
solute carrier family (>30 members), and zinc-finger protein 
family (>10 members). Mutations in at least one gene from 
each family have previously been associated with retinal 
disease: Klhl7 with autosomal dominant RP [56], Slc24Al 
with autosomal-recessive congenital stationary night 
blindness [57], and Znf513 with autosomal-recessive retinitis 
pigmentosa (RP) [58]. 



Figure 4. qRT-PCR validation of RNA- 
seq results. Comparison of differential 
expression values determined by RNA- 
seq (dark gray) and qRT-PCR (light 
gray) for 25 differentially expressed 
genes identified by Burrows-Wheeler 
Aligner (BWA) workflow. Error bars 
represent the standard error of the mean. 
Neural retina leucine zipper gene (AW) 
was not detectable by qRT-PCR and 
therefore are left blank in the graph. 
Note that Rhodopsin (Rho), guanine 
nucleotide binding protein, alpha 
transducing 1 (Gnatl), cyclic nucleotide 
gated channel alpha 1 (Cngal), and 
nuclear receptor subfamily 2, group E, 
member 3 (Nr2e3) having average 
crossing threshold (Ct) values greater 
than 30 in the Nrl~'~ samples are 
considered extremely low to non- 
expressing. 

DISCUSSION 

Specific patterns of gene expression define the morphology 
and function of distinct cell types and tissues. Changes in gene 
expression are associated with complex biologic processes, 
including development, aging, and disease pathogenesis. 
Until recently, such investigations focused on one or a few 
genes at a time. Advances in genomic technology have 
permitted simultaneous evaluation of most, if not all, genes 
that respond to an extrinsic microenvironment or intrinsic 
biologic program(s). Such studies are critical for delineating 
gene networks that can be targeted for treating specific 
diseases. RNA-seq allows comprehensive evaluation of 
transcriptomes, alternative transcripts, and coding 
polymorphisms. However, analyzing RNA-seq data has been 
challenging due to the complexity associated with quality 
control, sequence alignments, and handling of large data sets 
[59]. Several algorithms [45,60] have been proposed for 
mapping sequence reads to the reference genome, and 
multiple workflows [16,50] suggested for RNA-seq data 
analysis. Here, we report a detailed RNA-seq methodology 
using WT and AW _/ ~ retinas as a study paradigm and establish 
the high performance of NGS technology compared to 
microarray and qRT-PCR platforms for transcript 
identification and quantification studies. Consistent with 
recent studies [61], our RNA-seq data demonstrate high 
sensitivity, a wider dynamic range of coverage, and lower 
technical variability. 

Quantitative RT-PCR has long been considered the "gold 
standard" for mRNA quantification [62,63], and hence 
routinely used to validate results from transcriptome analysis 
studies. We show that FPKM values from RNA-seq analysis 
have a strong linear correlation across at least four orders of 
magnitude with Ct values from qRT-PCR. Expression of 



3043 



Moi 



ular Vision 2011; 17:3034-3054 <http://www.molvis.org/molvis/vl7/a327> 



) 201 1 Molecular Vision 



r— t ; so t-H r-; co co 

r~ on on d od 

ID »1 N O ^ 

co in cn ^ in 



r\ H — < 

t> ON 06 ^ O O 

O O on so in °S oo 
nu 



0N '*ND 0 C | CN| .ND<N rt on 1 . 
^^m^coOoN^^ rt O 



2 

0. m 



00 ON 
NO 



00 m -i O0 VD h 

M vi vi 0\ a\ n t 

^ 00 -i 00 * o 

on oo so so so 



o ^ 

00 fN 



«n «n 
>n >n 



oo-nififnn)-iiO-> 

t^rHcnt^O>t^^ttS ON O 

oor^coONDNDsoin 



< 

o 

D 



ft 
>, 
O 
ft 



SO 

a 



03 

■a 



a 



H 2 

» o 

0 u 



o 
H 



.S 



.s § 

I a 

O .a 
T3 S 
O 



-a ' 
o 



ft 



.a ^ 

B O 



CB 
00! 
r; 

— H 

g oo 

5 -S 

5 2 

too ft 

u bo 

<° a 

o> ~ 

3 ■§ 

O 42 



-a 
c 



3 



an 



a 
'ft 

g g) 

g O 

3 B 

a o 



03 

B 



a 

a ^ 







ac 


a 




'a 


■/:. 


o 


_« 








>, 


ca 


O 



<D <D 

2 ft 

toil .ST 
N 
<D 

o3 



ft Q 

m 

.S ON 

2 a 
S -S 

ft o 

a 1 ft 



<U 



O 
-CO 
ft 

+n 
O 

-co 
ft 
ft 

s 



"3 
ft 

ca- 1 

to 

c ft 

I < 
ft + 



ft 

Oft ft 
T3 A" 

o o 

•h ft 

,H oft 
o 

ft o 



2 t! 



ft 5? 
'£ 2 

00. J3 

ft ft 



Kj O 

.9 -5 



o 

003 

■t3 rt -a 

i o o 

P ft 



> 

O 03 



03 



s 

- - o S 
ft ft O 



s 

I 2 

I s 

a £ 



O <f ft 

to « 55 

o f3 H 

03 ! 

03 < 



O on 



ft » 

e| 

| 3 
2 o 

OJ 03 

« 

o o> 

J3 S 

ft 'a 
o 3 

ft O 



9 

CI 

a 

a 

o 



o 
ft 

0ft 



to 

03 

'-3 

.a 
j3 



03 

o 



o 
ft 



30 
03 



03 



50 
03 



ft 



30 
03 

.a 

j2 



M o *-h r-r; 
Z 

r3 
ft 

H 



03 

03 

CD 
03 

'2 

03 

o 



to a, a, qo; 



a: &3 ts 



2 s <N 



a, 



i— "Q 



-CO 



■n 

« !!§ ^ 'a. "S -5 
a ^3 OS «" «< 



&2 a, 



ON] ^ 

so ~^ v£ ~^ v£ 



to 



Sf 

to 

























o 






't 


r- 














NO 




























rn 






r- 












































r- 






o 


o 














o 






cn 


o 


oo 


in 


CO 


in 


oo 


so 


oc 


oo 


On 


On 


On 




NO 


o 


oc 


NO 


oc 


in 




NO 


o 


0N| 




oc 


't 


on 


SO 


r- 






t- 


in; 


en 


On 


in; 


t- 


CO 


CO 


NO 


0N| 


oc 


in 




0N| 


O 


NO 


It 




on 




OS 


o 


o 


r- 




SO 




o 


O 




O 










o 


CO 




ON 


OC 






CO 




oo 


oo 


0N| 


ON 


in 


OS 






0> 






NO 


oc 








NO 


NO 


CO 




oc 




oc 


o 




o 


o 




o 




o 




0N| 


o 




o 




o 


o 


o 






CS 






o 


o 


o 






o 


o 


o 


o 


o 


o 


o 


o 


o 


o 


o 




o 


o 


o 


o 




o 


O 




o 


o 


o 


o 



T3 
CO 



CI 
ft 

a u 
ft 
o= £ 

u T3 

°l 

°^ 

1 a 
la 

li 

° o 

90 CO 
3 og 
ft *H 

to o 
5 3 

♦3 to 
O 

J3 2 
ft o 

02 43 
<f ft 



3044 



Molecular Vision 2011; 17:3034-3054 <http://www.molvis.org/molvis/vl7/a327> 



) 201 1 Molecular Vision 



L . ^ oo *0 *>0 *^^*^mfirfi m m m c« 



cn 

oo 
oo 



vo ~ ~. 1^ 

^ - s - s 



<^ ^° <^ °° ^ ~ „, ^ 



3 ° 



n n ^= ^ 

t " ri e 



-a 
■a 



o 

CO. 



o 
p. 



p 

bO 

.9 .9 
'G £ 

o <L> 
&3 
00 o 



ba 
9 

T3 



'S 8, 

& oo 
u 5 



blj 

-a 

o 



oo 



o a 

to 



O Dh 

Td O 

H .3 O 

9 n 



00 

r-H Q> 



o 

00 o, > 



O 
blj 



s 3 W 



-a a 



H + 



2 B 



2 tS '5 



0 C m Z 



o o 



c c 



3 E 
O < 



£l 83 B 



c o 

(Z3 Dh 



2 ■o 

|f 
■a a, 

^H TO 

>> C 

3 >> 
< (Z3 



B 

re aj 

e | 

o .5 
u 

M B 

Ik g 



.2 o 



B S 



>i ED 
X C 

o '3 



d 2 

O O 



2 2 u 3 ■ 



U a 

8 '3 

<D h^ 

3 



s « 

2 a 
o .2 

oo g> 



o e 



e o o 

5? 3 ^ 



o 

(Z3 Dh 



2 

o a 



S s 



~H (V, 

K ft, « -ft 

- a, .sr s k 

O esi O O 



^ to a, to <i 



a, s s: 



O co i, ^ cb tq 



-h o in oo »n oo »n 

^J-^J-rOOON^H(N^O 



^in^O\0\^-H«nOMM^CO\0 

m^tooooNON(Nt^r^tnm<nt^mo 



H Z 



oooooooooooooooo 



I I I I I I I I I I I I I I I I I I I I I I I I 
zzzzzzzzzzzzzzzzzzzzzzzz 



-a 



2 

o 

c 



so 
-a 



to 



G 
S 

Ch 

C 

3 I 

o " 
"o ^ 

VP II 

0 f-H 
O S 

1 3 

1 1 

o o 

in C 

Is 

5 a 
o 8 
° 2 
■S 9 

6 o 

<! a, 



3045 



Molecular Vision 2011; 17:3034-3054 <http://www.molvis.org/molvis/vl7/a327> 



© 201 1 Molecular Vision 



r 




|SU3d3 

|Gucy2£ 
Iwispl 
Ipiekha2 
IRho 
|Ntl 
IGnatl 
|Kcn)14 
iTcpll 
|HaEveld3 
|Glbll3 
|Ccaad2 
|Edn2 
I Cngal 
|Ni2e3 

2 an 
|Hdc 
|Bstl 
Tn£E3f8 
E230008H13Rik 
Wdl63 
Sh2dla 
Nipall 
Tepp 
Faal60al 
He£2c 
Esrcb 
Cngbl 
Ptptu 
Slc24al 
Aqpl 
Ccdc87 
Hpdl 
Fscn2 
Rlbc2 
Lrcc2 

Mb 

Chst3 
Ttc21a 
Unc9b 
E*ld2 

2610034H16Rlk 
Reep6 
Ppaigclb 
Th£alp3 
Rn£207 
St3gall 

cibii: 

C P B 

Ppap2c 
Trim36 
Ebpl 
Ppaca 
Scube3 
Stkl7b 
Clt28d2 
Pdia5 
Col4a3 
Slc7a7 
Pol«2 
Ncoa4 
Hspalb 
rndcl 
Ccdc96 
LEZkl 
I ynlib: 
Cldn23 
Pcss38 
Kcemen2 
D2Ettd7S0e 
Dleu7 
Ls»2 

1700007K13Rlk 
SXcl7a3 
Cenpa 
HI CI 
Hltoo 
Hmgb2 
Fa»70b 
Coll9al 
ra»64a 
Ptpn20 

4932418E24Rlk 
HcalO 

Col5a3 



B 



1 2 3 4 5 6 




Cnga3 
Hoxdl 
Cngb3 
Otop3 
Rgl3 
Cdkll 
Egaa-lc 
Gulo 
Ptpnl3 
Elk3 
Klhl4 
Siipdl3a 
Clca3 
Dusp23 
Uwcl 
Rgs22 
Socs3 
Cckbr 
Plec 

■ Chrnb4 
■ Sall3 

Klhl33 

Frr,-; 

Cdk6 

TiuetulSt; 
Agpat9 
Abcb4 
B4galnt2 
Crb3 
Abi3 
A£id3c 
BendS 
Tbcld2 
Hist2h2ab 
Rxcg 
Smox 
Antxc2 
Fcamr 
Ccl9 

■ Hr6al 

■ Elovl7 
lRnase4 

■ Nphsl 
|Hct:C£l 

6330512H04Rik 
Hap3k8 
Fcrls 
Ccdcl55 
Cd37 
Rbp4 
Tcea3 
Fabp7 
Pnp2 
Adrb2 
Spintl 
Albg 
Egr3 

2310030G06Rlk 
Rdhl 
Wdr95 
Pthlr 
Abcblb 
Rerg 
H2-06 



Figure 5. Heatmaps and hierarchical clusters of differentially expressed rod-specific genes and cone-specific genes or those involved in retinal 
remodeling. Heatmaps with dendrograms of clusters of differentially expressed rod genes (A) and cone / retinal remodeling genes (B) by 
applying hierarchical clustering. A filtered list of mRNA transcript isoforms was further revised for fold change >1 .5 and p-value <0.05, and 
duplicate gene symbol rows were deleted to retain the most expressed isoform as reflective of the gene. This list was used to generate the 
heatmap and the master cluster. Specific clusters of rod specific genes and cone-specific or retinal remodeling genes were identified as clusters 
containing known rod genes (e.g., Rhodopsin [Rho], guanine nucleotide binding protein, alpha transducing 1 [Gnatl], cyclic nucleotide gated 
channel alpha 1 [Cngal], and nuclear receptor subfamily 2, group E, member 3 [Nr2e3]) and cone genes (e.g., fatty acid binding protein 7, 
brain [Fabp7], cyclic nucleotide gated channel alpha 3 [Cnga3], cyclic nucleotide gated channel beta 3 [Cngb3]). Columns 1, 2, and 3 are 
wild-type samples, and columns 4, 5, and 6 are Nrt'~ samples. 3046 
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several HKGs is underestimated by RNA-seq because of the 
algorithmic limitation associated with alignment of reads that 
map to multiple genomic locations (paralogous sequences or 
pseudogenes). All of the outlying HKGs inspected had a 
lower-than-projected FPKM value due to varying numbers of 
associated pseudogenes [64-67]. For example, Gapdh has 33 1 
pseudogenes in the mouse genome [64]. Our qRT-PCR data 
projected an FPKM value of approximately 4000 for Gapdh, 
yet the BWA workflow estimated an FPKM of only 6.6 in the 
WT retina (see Figure 3). This was also the case, but less 
severe, for Pgkl, Rps26, Rpll3a, and ActB. Current 
algorithms proportionally divide the number of reads aligning 
to multiple genes during FPKM calculation among those 
genes. In our study, unsuitable qRT-PCR assay design 
explains the remaining exceptions to the linear correlation 
between qRT-PCR and RNA-seq. After careful visual 
inspection of the aligned reads in IGV, we found that the assay 
designed for Wispl was not specific to the splice variant 
expressed in the retina. Similarly, the assays designed for 
Cadm3 and Ubc were specific to one of the two transcripts 
expressed in the retina. Hence, RNA-seq provides a better 



Figure 6. Comparison of the current and 
previous data sets of differentially 
expressed transcripts of Nrl~ h versus 
wild type (WT) mouse retina. Overlap 
between the differentially expressed 
transcripts (DETs) identified in the 
current study and previous studies using 
mouse retinas from the same age and 
genotype was determined using the 
Mouse Genome Informatics (MGI) 
gene symbol as the identifier. The 
current study includes all mRNA 
transcripts identified with the Burrows- 
Wheeler Aligner (BWA) workflow 
(fold change >1.5 and p value <0.05). 
The 438 DETs of an Affymetrix 
microarray study [38] and 6,123 DETs 
from another RNA-seq study [54] with 
similar criteria were used for 
comparison with our study. Of the 438 
DETs from the Corbo et al. [38] study, 
157 transcripts are not significantly 
differentially expressed in our data, 1 1 
are expressed below the fragments per 
kilobase of exon model per million 
mapped reads (FPKM) detection 
threshold of 1.0, and 38 do not map to 
the current annotations. Of the 6,123 
DETs from the Mustafi et al. [54] study, 
4,858 transcripts are not significantly 
differentially expressed in our study, 
348 are expressed below our FPKM 
detection threshold of 1 .0, and 62 do not 
map to the current annotations. 



assessment of alternate iso forms, and transcript quantification 
is not limited by the design of qRT-PCR assays. 

We took advantage of the RNA-seq data to inspect the 
expression of commonly used HKGs (see Table 5) for 
normalization in qRT-PCR assays. The choice generally 
depends on specific tissue and/or developmental time points 
being investigated. Our RNA-seq studies suggest that most of 
the HKGs can be used for normalization calculation in qRT- 
PCR assays; however, Gapdh, [3-2 microglobulin (B2m), and 
Ubc do not appear to be good choices. Additional RNA-seq 
data would help in delineating relevant HKGs appropriate for 
qPCR validation in developing retina or cell types. 

We compared two different strategies for analyzing WT 
and Nrl~ h RNA-seq data. The BWA workflow relies on fast 
and accurate gapped alignment of reads to the exonic regions 
of the genome. Since the gap between most adjacent exons is 
larger than a few bases, the cumulative gap extension penalty 
underestimates the quality of the alignment of reads spanning 
the splice junctions. Hence, the BWA workflow produced 
accurate quantitative estimation of gene and transcript 
isoform expression while losing valuable information about 
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alternate splicing. The higher accuracy of quantitative gene 
expression estimates by the BWA workflow compared to 
those by TopHat is evident from the stronger correlation 
determined by linear regression analysis of the DETs. The 
regression line from BWA had a slope of -1.056 (compared 
to -0.905 for TopHat) and R 2 of 0.8798 (compared to 0.7727 
for TopHat). 

The TopHat workflow maps the reads to exonic regions 
of the reference genome as well as across all known and 
putative splice junctions defined in the Ensembl GTF file. 
TopHat attempts to map reads across splice junctions defined 
in the Ensembl GTF file and across novel splice junctions 
detected during the first phase of alignment. Hence, the 
TopHat workflow maps significantly more reads starting with 
the same number of pass filter (PF) reads and detects 
additional transcript isoforms missed by the BWA workflow. 
The source of genomic annotations used by these methods is 
another important difference. UCSC refFlat annotation (used 
by the BWA workflow) for the mouse reference genome 
(build mm9) contained approximately 28,000 unique 
transcript isoforms, whereas the Ensembl GTF file (used by 
the TopHat workflow) for the same genome build listed three 
times more unique transcript isoforms. The problem is 
amplified because of the lack of one-to-one mapping for 
several transcripts defined in the UCSC refFlat file in Ensembl 
GTF. Hence, a non-trivial number of DETs detected by the 
BWA workflow could not be mapped to any DET from the 
TopHat workflow (see Figure 2, regions shaded in green). 

The BWA workflow detects about 16,000 transcripts in 
the retina, with a minimum expression equivalent to one 
transcript per cell (i.e., 1 FPKM) [16]. When the criteria were 
relaxed to cover transcripts expressed at low levels (0.1 
FPKM), 20,707 transcripts were detected in the retina. This is 
not surprising as the whole retina includes more than 50 
distinct neuronal cell types, and each cell would achieve 
protein diversity largely by alternative promoter usage and/or 
alternative splicing [68]. The TopHat workflow yields 
thousands of known and putative transcript isoforms not 
previously described in the retina. However, validating these 
novel isoforms predicted from RNA-seq data remains a 
challenge. 

Integrated analysis of RNA-seq data with miRNA-seq, 
transcription factor binding sites data (chromatin 
immunoprecipitation sequencing-Chip-Seq), genetic 
variations (expression Quantitative Trait Loci) [69], and 
methylation patterns would allow decoding of the complex 
regulatory networks associated with retinal development and 
function. Several technical improvements would however be 
necessary to overcome the bias introduced into the RNA-seq 
data due to GC content, mappability of reads, length of the 
gene, and regional differences due to local sequence structure 
[70]. RNA-seq methods are more likely to identify longer 
differentially expressed transcripts than shorter transcripts 
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with the same effect size [71]. New statistical methods are 
being developed to correct for systematic biases inherent in 
NGS data [70-72]. In the coming years, we will witness an 
explosion in high throughput genomic methods that are 
expected to revolutionize biology and biomedical discovery. 
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